Exo: Atomic Broadcast for the Rack-Scale Computer
Authors
Abstract
Agreement is a crucial component of many distributed systems. It lies at the heart of critical algorithms such as consensus, leader election, and failure detection. Intuitively, agreement between processes is possible only in the presence of uninterrupted or well-ordered operations. One powerful primitive that provides such ordering is atomic, or ‘total order’, broadcast. How atomic broadcast is implemented depends on the underlying communication infrastructure. On a single device, atomicity is relatively unambiguous, often thanks to hardware support. For example, a single processor can enforce atomic reads and writes to shared memory. In multiprocessors, cache coherency is maintained by distributed MESI/MOESI bus protocols or by centralised directory-based schemes. Atomicity within these environments is facilitated by special-purpose, low-latency, highly reliable interconnects. Consequently, atomic operations inside a single machine are fast and can complete in just a few CPU cycles. Across devices, few assumptions can be made: the communication infrastructure that connects devices is general purpose, higher latency, and less reliable. In the absence of hardware support, agreement is reached through software state replication and consistency algorithms such as Paxos [2] and ZooKeeper (Zab) [5]. These solutions are effective but complex, and as a consequence atomic broadcast in software is slow: previous studies have shown that atomic broadcasts can take milliseconds to complete [4].

The rack-scale computer (RSC) falls somewhere between these two worlds. On the one hand, we would like to program the RSC as if it were a single multiprocessor machine, with fast, hardware-supported atomic primitives. On the other, we would like individual components of this machine to be able to fail without affecting the operation of the machine as a whole.
Our work is motivated by this apparent contradiction and by the observation that closely co-located devices in RSCs present an opportunity to re-envision network support for distributed operations. In response, we are engineering Exo, a fast and efficient network architecture and protocol for atomic broadcast at rack scale. Exo employs a special-purpose network constructed from general-purpose Ethernet networking components. The Exo physical infrastructure, shown in Figure 1, comprises a broadcast/aggregate network similar to Hubnet [3]. We envisage the Exo network as one of many (potentially special-purpose) networks present in the RSC. Logically, Exo implements a token ring protocol similar to Totem [1]. Token rings are a well-understood mechanism for building atomic broadcast systems: they are cheap to build and run at predictably high speeds. The Exo protocol …
Figure 1: Physical Exo architecture.
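To illustrate why a token ring yields a total order, here is a minimal Python sketch assuming an idealised, loss-free broadcast medium. It is a generic Totem-style simulation, not the Exo protocol itself, and the names `Node` and `rotate_token` are hypothetical. The token carries the last assigned sequence number; only the current holder may stamp and broadcast its pending messages, so every message receives a unique, gap-free sequence number. Receivers may see messages out of order, so each one holds them back and delivers strictly by sequence number. Failures, token loss, retransmission, and membership changes, which a real protocol such as Totem must handle, are omitted.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """A participant on the logical ring (hypothetical, for illustration only)."""
    node_id: int
    pending: list = field(default_factory=list)    # messages waiting for the token
    holdback: dict = field(default_factory=dict)   # seq -> message, received but not yet deliverable
    next_to_deliver: int = 1                        # next global sequence number to hand to the application
    delivered: list = field(default_factory=list)  # messages delivered, in total order

    def receive(self, seq, message):
        # Buffer the arrival, then deliver every consecutive sequence number now available.
        self.holdback[seq] = message
        while self.next_to_deliver in self.holdback:
            self.delivered.append(self.holdback.pop(self.next_to_deliver))
            self.next_to_deliver += 1


def rotate_token(nodes, rounds, rng):
    """Simulate `rounds` rotations of the token around `nodes` (in list order).

    The token is modelled implicitly by `token_seq`, the last sequence number
    assigned; visiting nodes in ring order stands in for passing it on.
    Returns the highest sequence number assigned.
    """
    token_seq = 0
    for _ in range(rounds):
        in_flight = []
        for holder in nodes:
            # Only the current token holder may stamp and broadcast its messages.
            while holder.pending:
                token_seq += 1
                in_flight.append((token_seq, (holder.node_id, holder.pending.pop(0))))
        # Assumed loss-free broadcast medium that may reorder arrivals at each receiver.
        for receiver in nodes:
            arrivals = in_flight[:]
            rng.shuffle(arrivals)
            for seq, message in arrivals:
                receiver.receive(seq, message)
    return token_seq


# Usage: three nodes, one token rotation.
rng = random.Random(7)
ring = [Node(i) for i in range(3)]
ring[0].pending += ["a", "b"]
ring[1].pending += ["d"]
ring[2].pending += ["c"]
total = rotate_token(ring, rounds=1, rng=rng)
print(total)              # 4
print(ring[2].delivered)  # [(0, 'a'), (0, 'b'), (1, 'd'), (2, 'c')]
```

Because the order is fixed by whoever holds the token rather than by a separate agreement round, receivers need only a reorder buffer to agree on delivery order; the trade-off is that a node must wait for the token before it can send.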
Similar resources
Retrieval–travel-time model for free-fall-flow-rack automated storage and retrieval system
Automated storage and retrieval systems (AS/RSs) are material handling systems that are frequently used in manufacturing and distribution centers. The modelling of the retrieval–travel time of an AS/RS (expected product delivery time) is practically important, because it allows us to evaluate and improve the system throughput. The free-fall-flow-rack AS/RS has emerged as a new technology for dr...
R2C2: A Network Stack for Rack-scale Computers – Public Review
For modern Internet data centers, a key challenge to meeting the immense compute, storage, and networking needs of next-generation applications is the ability of the underlying infrastructure to scale. An important new trend in this direction is the introduction of so-called “rack scale” computers, which are large numbers of tightly integrated systems-on-chip (SoC) processors, interconnected wit...
Rethinking the Network Stack for Rack-scale Computers
The rack is increasingly replacing individual servers as the basic building block of modern data centers. Future rack-scale computers will comprise a large number of tightly integrated systems-on-chip, interconnected by a switch-less internal fabric. This design enables thousands of cores per rack and provides high bandwidth for rack-scale applications. Most of the benefits promised by these ne...
Architecture Support for Concurrency Control in Datacenters
Modern large-scale applications are highly concurrent and require efficient concurrency control mechanisms to achieve high performance, while preserving consistency of the data that is shared between a large number of servers. Traditional software techniques, such as atomic operations, used in concurrency control mechanisms introduce considerable overheads, whereas mechanisms leveraging archite...
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
The Hadoop MapReduce framework is an important distributed processing model for large-scale, data-intensive applications. The current Hadoop and the existing Hadoop distributed file system’s rack-aware data placement strategy in MapReduce in the homogeneous Hadoop cluster assume that each node in a cluster has the same computing capacity and that the same workload is assigned to each node. Default Hadoop d...